Binaural Speech Separation Using Recurrent Timing Neural Networks for Joint F0-Localisation Estimation
Authors
Abstract
A speech separation system is described in which sources are represented in a joint interaural time difference-fundamental frequency (ITD-F0) cue space. Traditionally, recurrent timing neural networks (RTNNs) have been used only to extract periodicity information; in this study, this type of network is extended in two ways. Firstly, a coincidence detector layer is introduced, each node of which is tuned to a particular ITD; secondly, the RTNN is extended to become two-dimensional to allow periodicity analysis to be performed at each best-ITD. Thus, one axis of the RTNN represents F0 and the other ITD, allowing sources to be segregated on the basis of their separation in ITD-F0 space. Source segregation is performed within individual frequency channels without recourse to across-channel estimates of F0 or ITD that are commonly used in auditory scene analysis approaches. The system is evaluated on spatialised speech signals using energy-based metrics and automatic speech recognition.
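The joint ITD-F0 representation described in the abstract can be illustrated with a small sketch. This is not the authors' RTNN: it assumes a single frequency channel, replaces the recurrent delay loops with a plain product-and-autocorrelation stand-in, and all names (`joint_itd_f0_map`, `strongest_source`, the lag ranges) are hypothetical choices made for illustration only.

```python
def joint_itd_f0_map(left, right, itd_lags, period_lags):
    """Score every (ITD lag, period lag) pair for one frequency channel.

    Simplified stand-in for a two-dimensional RTNN (an assumption, not the
    paper's network): one axis indexes ITD, the other indexes period (1/F0).
    Lags are in samples; period lags must be positive.
    """
    n = min(len(left), len(right))
    scores = {}
    for d in itd_lags:
        # Coincidence detector tuned to ITD d: the left signal delayed by d
        # samples is multiplied with the right signal.
        coinc = [left[t - d] * right[t] if 0 <= t - d < n else 0.0
                 for t in range(n)]
        for p in period_lags:
            # Periodicity analysis of this detector's output at lag p
            # (autocorrelation used here in place of recurrent delay loops).
            scores[(d, p)] = sum(coinc[t] * coinc[t - p] for t in range(p, n))
    return scores


def strongest_source(scores):
    """Return the (ITD lag, period lag) pair with the highest score."""
    return max(scores, key=scores.get)


# Toy input: a pulse train with a 10-sample period, arriving 3 samples
# later at the right ear than at the left ear.
period, delay = 10, 3
left = [1.0 if t % period == 0 else 0.0 for t in range(200)]
right = [left[t - delay] if t >= delay else 0.0 for t in range(200)]

scores = joint_itd_f0_map(left, right, range(0, 6), range(5, 16))
print(strongest_source(scores))
```

For this toy input the map has its only non-zero entry at an ITD lag of 3 samples and a period lag of 10 samples. With two talkers that differ in position or pitch, each would be expected to produce its own peak, which is the separation in ITD-F0 space that the abstract relies on.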
Similar resources
Recurrent Timing Neural Networks for Joint F0-Localisation Estimation
A novel extension to recurrent timing neural networks (RTNNs) is proposed which allows such networks to exploit a joint interaural time difference-fundamental frequency (ITD-F0) auditory cue as opposed to F0 only. This extension involves coupling a second layer of coincidence detectors to a two-dimensional RTNN. The coincidence detectors are tuned to particular ITDs and each feeds excitation to...
Singing Voice Separation Using Deep Neural Networks and F0 Estimation
Deep Neural Networks (DNNs) have become a popular approach for speech enhancement and singing voice separation. DNNs are typically trained to estimate a time-frequency mask using ground truth examples. In this submission, we combine DNN estimation as a first step with traditional refinement via F0 estimation, using the YINFFT algorithm.
Neural networks for speech separation for binaural hearing aids
This paper deals with the use of neural networks for separating speech from other noisy sources in binaural hearing aids. In sound separation systems implemented in binaural hearing aids, the right and left hearing aids need to transmit to each other some parameters involved in the speech separation algorithm. The problem is that this transmission reduces the battery life, which is one of the m...
Binaural Reverberant Speech Separation Based on Deep Neural Networks
Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, a deep neural network (DNN) is constructed to map from both spectral and spatial features to a training target. For spectral feature extraction, we first convert binaura...
Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberation is present in the mixtures. In this paper, we propose a new stereo speech separation system where deep neural networks are used to generate a soft T-F mask for separation. More specific...